10 - Introduction to Machine Learning [ID:43114]

Okay, welcome everyone.

Let's start as usual with a small quiz.

Okay, some more taking pictures.

So the first question, I mean you can already start, it should be available.

So the first question is about speech features.

So we talked a little bit about speech production and how to produce speech features.

We learned that there are different categories, and here we look more at the short-term spectral features,

and this is where MFCCs come into play.

And so the first question of this quiz is to put the steps of the algorithm in the correct order.

So some more people arriving, that's great.

Okay.

Good, so we have some votes. In the interest of time,

we will see what you got.

And I think I can also make that bigger here.

But then you don't see, okay, I guess it's enough in that way.

So some people didn't get it right, but that is not so bad, it's a difficult question.

I mean you have to order quite a few parts here.

So let's go maybe back to the original order here.

So in the first stage you have your speech signal and you have to

cut it into overlapping windows.

Typical window sizes are between 10 and 30 milliseconds, so 20 or 25 milliseconds

is a quite usual window length for these MFCCs.
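
The framing step could be sketched as follows, in plain Python without external libraries. The 16 kHz sample rate, the 10 ms hop between windows, and the name `frame_signal` are assumptions for illustration and were not given in the lecture.

```python
def frame_signal(signal, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal (list of samples) into overlapping frames."""
    win_len = int(round(sample_rate * win_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    # Only full frames are kept; trailing samples that do not fill a window are dropped.
    for start in range(0, len(signal) - win_len + 1, hop_len):
        frames.append(signal[start:start + win_len])
    return frames


# One second of a dummy signal at an assumed sample rate of 16 kHz.
signal = [0.0] * 16000
frames = frame_signal(signal, sample_rate=16000)
print(len(frames), len(frames[0]))
```

With a 25 ms window and a 10 ms hop, neighboring frames share 15 ms of samples, which is the overlap mentioned above.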

So afterwards you apply a Hamming window, so this is stage B, because

the Fourier transform is only applicable if you have a periodic signal,

so periodic samples.

And in this way you can at least mitigate this problem.
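
As a minimal sketch of this windowing step, the Hamming window can be written out with its usual coefficients 0.54 and 0.46. The function names and the frame of 400 constant samples are assumptions for illustration only.

```python
import math


def hamming(n):
    """Hamming window of length n: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def apply_window(frame):
    """Multiply each sample of the frame by the matching window weight."""
    weights = hamming(len(frame))
    return [x * w for x, w in zip(frame, weights)]


# A constant frame makes the taper visible: small at the edges, close to 1 in the middle.
frame = [1.0] * 400
windowed = apply_window(frame)
print(windowed[0], windowed[200], windowed[-1])
```

The taper pulls both ends of each frame towards a common small value, so the jump that appears when the frame is treated as one period of a periodic signal becomes much smaller.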

And afterwards you compute the power spectrum or cepstrum, so this is a little bit dependent

on the algorithm you're using specifically.

I think the original version used just the power spectrum and more modern versions

use the cepstrum.

Here you then apply the DFT directly on these short windows.
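
To make the DFT step concrete, here is a rough sketch that computes a power spectrum with a naive DFT on one short frame. The normalization by the frame length and the 64-sample test tone are assumptions, and in practice an FFT routine would be used instead of this O(N^2) loop.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT of a real frame; returns |X[k]|^2 / N for k = 0 .. N//2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


# A pure tone that completes exactly 4 cycles within a 64-sample frame.
n = 64
frame = [math.sin(2.0 * math.pi * 4 * t / n) for t in range(n)]
ps = power_spectrum(frame)
peak_bin = max(range(len(ps)), key=lambda k: ps[k])
print(peak_bin)
```

Only the bins from 0 up to N/2 are kept, because for a real-valued frame the upper half of the spectrum mirrors the lower half.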

Afterwards you integrate over so-called mel-scaled bands: you now have your

frequencies and you basically average neighboring frequencies together within certain bands, the so-called

mel-scaled filter bands.

So one band will go only, I don't know, from 8,000 to 10,000 hertz, and over those frequencies you average,

or rather you weight them with the coefficients of this mel filter bank.
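
The mel filter bank could look roughly like the sketch below, using triangular filters spaced equally on the mel scale. The conversion formula 2595 * log10(1 + f / 700) is one common convention and was not stated in the lecture. The 26 filters, 201 spectrum bins, and 16 kHz sample rate are likewise assumed values.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters over n_bins spectrum bins covering 0 .. sample_rate / 2."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies: each filter spans three consecutive edges.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [nyquist * b / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_filterbank(power_spec, bank):
    """One energy per band: weighted sum of the power spectrum."""
    return [sum(w * p for w, p in zip(weights, power_spec)) for weights in bank]


bank = mel_filterbank(n_filters=26, n_bins=201, sample_rate=16000)
energies = apply_filterbank([1.0] * 201, bank)  # flat dummy spectrum
print(len(energies))
```

Because the edges are equally spaced in mel and not in hertz, the filters at high frequencies cover many more bins than the ones at low frequencies, and neighboring triangles overlap.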

And afterwards you apply a DCT.

This is for decorrelating the signal even further. I mean, you

still have overlap in the frequencies, because the filters of this mel-frequency filter bank

overlap, and to decorrelate this further you apply this DCT. This is

basically then the last step and you get out the MFCCs. Then typically what you do

is also that you compute first- and second-order derivatives and append

them as additional features, because the current frame typically depends on the previous

frames. Yeah, this was the first question, a little bit about MFCCs.
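
The last step and the derivative features might be sketched like this. The log compression before the DCT is standard for MFCCs but was not spelled out above, so treat it as an assumption. The same goes for keeping 13 coefficients, the simple central-difference derivative, and the made-up band energies.

```python
import math


def dct2(values):
    """Unnormalized DCT-II: c[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(values))
            for k in range(n)]


def mfcc_from_energies(band_energies, n_coeffs=13):
    # Assumed log compression; the small constant avoids log(0).
    log_e = [math.log(e + 1e-10) for e in band_energies]
    return dct2(log_e)[:n_coeffs]


def deltas(frames):
    """Derivative over time: 0.5 * (c[t+1] - c[t-1]), with edge frames repeated."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [[0.5 * (nxt - prev) for prev, nxt in zip(padded[t], padded[t + 2])]
            for t in range(len(frames))]


# Three dummy frames with 26 band energies each (made-up values).
energy_frames = [[1.0 + t + b for b in range(26)] for t in range(3)]
mfccs = [mfcc_from_energies(e) for e in energy_frames]
d1 = deltas(mfccs)  # first-order derivatives
d2 = deltas(d1)     # second-order derivatives
features = [c + a + b for c, a, b in zip(mfccs, d1, d2)]
print(len(features), len(features[0]))
```

Each frame then carries its 13 coefficients plus 13 first-order and 13 second-order derivative values, so a little context from the neighboring frames ends up in every feature vector.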

Let's continue to the second one, and I think I have to click here. So it's a rather

straightforward question I would say. So we learned several image-based features,

and some of them are based on gradient orientations, and maybe you remember

that one in particular even has it in its name. This is why I abbreviated the

algorithms here, but maybe you remember what that was.

Okay, seems to be not so difficult for you. Already 11 people voted. Maybe in

Access: Open access

Duration: 01:30:45

Recording date: 2022-07-01

Uploaded: 2022-07-01 18:19:05

Language: en-US
